Abstract: In this paper, we propose a fixed-complexity sphere encoder (FSE) for multi-user MIMO (MU-MIMO) systems. The proposed FSE achieves a scalable tradeoff between performance and complexity. Moreover, because it has a parallel tree-search structure, the proposed encoder can be easily pipelined, which greatly reduces the precoding latency. We also analyze the complexity of the proposed encoder and propose two techniques that reduce it. Simulation and analytical results demonstrate that, in a 4 x 4 MU-MIMO system, the proposed FSE requires only 11.5% of the computational complexity of the conventional QRD-M encoder (QRDM-E). Furthermore, the encoding throughput of the proposed encoder is 7.5 times that of the QRDM-E, with a tolerable degradation in BER performance, while the optimum diversity order is achieved.

Index Terms: Multi-user MIMO systems, precoding, QRD-M encoder, sphere encoder, Tomlinson-Harashima precoder.



I. INTRODUCTION 

Multi-user multiple-input multiple-output (MU-MIMO) techniques are becoming increasingly important, as the base station (BS) must be able to communicate simultaneously with a large number of users. To this end, dirty paper coding (DPC), which has been shown to achieve the capacity region of the Gaussian MIMO broadcast channel [1], was proposed by Costa [2].

Several MU-MIMO precoding schemes have been proposed in the literature in order to approach capacity. Linear zero-forcing (ZF) precoding was introduced in [3], where the transmitted vector is pre-filtered using the pseudo-inverse of the channel matrix. As a consequence, a high transmission power is required, particularly when the channel matrix is ill-conditioned. To overcome this problem, regularized channel inversion, i.e., the linear minimum mean square error (MMSE) precoder, was proposed to reduce the required transmission power of ZF precoding while achieving a tradeoff between interference and noise amplification [3]. Moreover, the Tomlinson-Harashima precoding (THP) scheme achieves better performance by limiting the transmit power via a non-linear modulo operation [4], [5]. It has also been shown that the coding loss due to the modulo operation vanishes for high-order modulation schemes. The transmit power can be further reduced by perturbing the transmitted vector, as in [6], where the optimum perturbation vector is found using the sphere encoder (SE). Although the SE has a small average computational complexity, its worst-case complexity is very high. Besides, the SE is sequential in the tree-search phase, which limits the potential for efficient hardware implementation. To overcome the random complexity of the SE, the QR-decomposition with M-algorithm encoder (QRDM-E) was proposed in [7]. The main idea of the QRDM-E is to retain a fixed number of candidates at each encoding level. Although the QRDM-E achieves the same performance as the SE, its complexity is high, and its tree-search strategy limits the possibilities for an efficient pipelined hardware implementation.

In this paper, we propose a fixed-complexity sphere encoder (FSE), based on the fixed-complexity sphere decoder [8], that achieves the same (optimum) diversity order as the QRDM-E, as well as a flexible tradeoff between performance and complexity. Moreover, because the proposed FSE has a parallel search structure, it can be pipelined, which greatly reduces the precoding latency. In addition, we evaluate the complexity of the conventional schemes and of the proposed FSE, and introduce two techniques that reduce the complexity of the FSE.

The rest of this paper is organized as follows. In Section II, we introduce the system model and the problem statement. In Section III, we review the conventional vector-perturbation techniques. In Section IV, we introduce the proposed FSE, and in Section V we analyze its complexity and those of the SE and QRDM-E. Two methods to reduce the complexity of the FSE are proposed in Section VI, and simulation results are shown in Section VII. Finally, conclusions are drawn in Section VIII.



II. SYSTEM MODEL AND THE PROBLEM STATEMENT

We consider a downlink MU-MIMO transmission system in which a BS with $N_t$ transmit antennas communicates simultaneously with $N_u$ decentralized single-antenna users. Without loss of generality, we assume that $N = N_t = N_u$. We also consider a flat-fading, slowly time-varying channel whose state information is perfectly known at the transmitter, unless otherwise mentioned. The complex-valued system is then converted into an equivalent $K$-dimensional real lattice problem, where $K = 2N$.
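The text does not spell out how the complex system becomes a real lattice problem. A common convention, assumed here, stacks real and imaginary parts so that an $N \times N$ complex channel becomes the $2N \times 2N$ real matrix $[\Re\mathbf{H}, -\Im\mathbf{H}; \Im\mathbf{H}, \Re\mathbf{H}]$. The Python sketch below (function names are hypothetical) builds this matrix and checks that the real model reproduces the complex product.

```python
# Sketch of an assumed complex-to-real decomposition; helper names are hypothetical.
def complex_to_real(H):
    """Map an N x N complex matrix to [[Re H, -Im H], [Im H, Re H]] (2N x 2N)."""
    n = len(H)
    Hr = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            a, b = H[i][j].real, H[i][j].imag
            Hr[i][j], Hr[i][j + n] = a, -b
            Hr[i + n][j], Hr[i + n][j + n] = b, a
    return Hr


def stack_real(v):
    """Stack a complex vector as [Re v; Im v]."""
    return [z.real for z in v] + [z.imag for z in v]


def matvec(M, v):
    """Plain matrix-vector product for lists of lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# A 2 x 2 complex channel becomes a K = 4 real lattice problem.
H = [[1 + 2j, 3 - 1j], [0.5j, 2 + 0j]]
x = [1 - 1j, -1 + 1j]
y = [sum(h * xi for h, xi in zip(row, x)) for row in H]   # complex y = H x
y_real = matvec(complex_to_real(H), stack_real(x))       # equivalent real model
print(all(abs(a - b) < 1e-12 for a, b in zip(y_real, stack_real(y))))
```

With this convention, every real-valued quantity used in the rest of the section ($\mathbf{H}$, $\mathbf{s}$, $\mathbf{t}$) has dimension $K = 2N$.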

Let $\mathbf{H} \in \mathbb{R}^{K \times K}$ denote the (real-valued) channel matrix, and let $\mathbf{s} \in \mathbb{R}^{K}$ denote the data vector.

Linear precoding techniques are the simplest. In linear zero-forcing (LZF) precoding, the effect of the channel is canceled by precoding the transmitted data vector using the pseudo-inverse of the channel matrix,

where the scaling factor γ fixes the expected total transmit power to $P_t$; that is,

where Tr(·) denotes the trace operator. As a consequence, the receive SNR at any mobile station (MS) is given by:



$$\mathrm{SNR} = \frac{\mathbb{E}(\mathbf{s}\mathbf{s}^{*})}{\gamma\,\sigma^{2}}. \qquad (3)$$



If the channel matrix is ill-conditioned, γ becomes large and, consequently, the post-processing signal-to-noise ratio (SNR) decreases. To partially overcome this drawback, linear minimum mean-square error (LMMSE) precoding can be used to regularize the channel matrix. The precoded signal using LMMSE is therefore given by:

where $\alpha = K\sigma^{2}/P_t$ is the regularization factor and $\sigma^{2}$ is the noise variance. Although the LMMSE precoder reduces the required transmit power, i.e., reduces γ, its performance is still limited, and further improvement can be achieved.

Unlike the LMMSE precoder, which regularizes the channel matrix, the non-linear THP algorithm works on the data vector $\mathbf{s}$ so that the required transmit power is reduced [4], [5]. Hence, the THP algorithm can be represented linearly as finding the perturbed vector

$$\tilde{\mathbf{s}} = \mathbf{s} + \tau\mathbf{t}, \qquad (5)$$

such that the required transmit power of the precoded vector is reduced. In (5), τ is a positive constant that depends on the employed modulation scheme, and $\mathbf{t}$ is a $K$-dimensional integer vector. In [6], τ is given by:

$$\tau = 2\left(|c|_{\max} + \frac{\Delta}{2}\right), \qquad (6)$$

where $|c|_{\max}$ is the absolute value of the constellation point with the largest magnitude, and Δ is the spacing between the constellation points. Note that the THP algorithm finds the elements of $\mathbf{t}$ successively: at each encoding level, the candidate $t_k$ that minimizes the required transmit power is retained. This is equivalent to successive interference cancellation in the signal detection literature.
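The successive selection can be made concrete with a short Python sketch. It assumes a lower-triangular matrix $\mathbf{L}$ such as the one defined later in this section for the tree search, a real data vector $\mathbf{s}$, the constant τ, and a finite candidate set for each $t_k$ (the set $\mathcal{A}$ defined in (10) below). At level $k$ it keeps the $t_k$ that minimizes the magnitude of the $k$-th entry of $\mathbf{L}(\mathbf{s}+\tau\mathbf{t})$, given the entries already decided; the modulo arithmetic of a practical THP is not modeled, and the function name is hypothetical.

```python
# Sketch of the successive (DFE-path) choice of t; names and the toy L are hypothetical.
def thp_perturbation(L, s, tau, A):
    """Greedy level-by-level choice of t.

    L   : K x K lower-triangular matrix (list of lists)
    s   : real data vector of length K
    tau : perturbation constant, e.g. from (6)
    A   : finite candidate set for each t_k, e.g. range(-a, a + 1)
    Returns (t, metric) with metric = ||L (s + tau t)||^2.
    """
    K = len(s)
    t, x = [], []          # chosen t_k and perturbed entries s_k + tau * t_k
    metric = 0.0
    for k in range(K):
        # contribution of the levels that are already decided
        past = sum(L[k][j] * x[j] for j in range(k))
        # keep only the candidate with the smallest |k-th entry|
        best = min(A, key=lambda c: abs(past + L[k][k] * (s[k] + tau * c)))
        t.append(best)
        x.append(s[k] + tau * best)
        metric += (past + L[k][k] * x[k]) ** 2
    return t, metric


# Toy example: K = 2, tau = 4 (|c|max = 1 and spacing 2 in (6)).
L = [[1.0, 0.0], [0.9, 0.5]]
t, metric = thp_perturbation(L, s=[1.0, 1.0], tau=4.0, A=range(-1, 2))
print(t, round(metric, 2))   # [0, -1] 1.36
```

Only one candidate survives per level, which is why this greedy rule is cheap but generally misses the minimizer of (7).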

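Although the THP algorithm reduces the required transmit power compared with the linear precoding schemes, better performance can be obtained by optimally perturbing the transmit vector so that the transmit power is reduced further. Vector perturbation can be represented as an integer-lattice search in which, at the transmitter, $\mathbf{t}$ is chosen such that γ is minimized; that is,

$$\mathbf{t} = \arg\min_{\mathbf{t}} \left\{(\mathbf{s}+\tau\mathbf{t})^{T}\mathbf{P}^{T}\mathbf{P}(\mathbf{s}+\tau\mathbf{t})\right\} = \arg\min_{\mathbf{t}} \left\|\mathbf{P}(\mathbf{s}+\tau\mathbf{t})\right\|^{2}, \qquad (7)$$

where $\mathbf{P}$ denotes the (ZF or MMSE) precoding matrix.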



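Let the transpose of the channel matrix be factorized into the product of a unitary matrix $\mathbf{Q}$ and an upper triangular matrix $\mathbf{R}$, i.e., $\mathbf{H}^{T} = \mathbf{Q}\mathbf{R}$. The search problem in (7), based on the zero-forcing criterion, then simplifies to

$$\mathbf{t} = \arg\min_{\mathbf{t}} \left\|\mathbf{L}(\mathbf{s}+\tau\mathbf{t})\right\|^{2}, \qquad (8)$$

where the lower triangular matrix $\mathbf{L}$ equals $(\mathbf{R}^{-1})^{T}$. When the MMSE criterion is used, the extended matrix $\underline{\mathbf{H}} = [\mathbf{H} \;\; \sqrt{\alpha}\,\mathbf{I}]^{T}$ is factorized into $\underline{\mathbf{Q}}$ and $\mathbf{R}$, where $\mathbf{L}$ again equals $(\mathbf{R}^{-1})^{T}$. Partitioning $\underline{\mathbf{Q}} = [\mathbf{Q}_1^{T} \;\; \mathbf{Q}_2^{T}]^{T}$, the QR-decomposition property [10] gives

$$\mathbf{R}^{-1} = \frac{\mathbf{Q}_2}{\sqrt{\alpha}}, \qquad (9)$$

so that $\mathbf{L} = \mathbf{Q}_2^{T}/\sqrt{\alpha}$. Since $\sqrt{\alpha}$ is by definition a strictly positive real number, it does not affect the search result in (8). Therefore, $\mathbf{L} = \mathbf{Q}_2^{T}$ also leads to the required perturbation without the need to explicitly invert $\mathbf{R}$.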
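A quick numerical check of (9) can help when implementing the MMSE variant. The pure-Python sketch below assumes that a modified Gram-Schmidt QR is adequate for a small, well-conditioned example (helper names are hypothetical). It factorizes the extended matrix $[\mathbf{H}\;\sqrt{\alpha}\mathbf{I}]^{T}$, takes the lower block $\mathbf{Q}_2$, and confirms that $\mathbf{R}\mathbf{Q}_2/\sqrt{\alpha}$ is the identity, so that $\mathbf{Q}_2^{T}$ is lower triangular and matches $(\mathbf{R}^{-1})^{T}$ up to the scale $\sqrt{\alpha}$.

```python
import math


# Sketch only: thin QR by modified Gram-Schmidt; helper names are hypothetical.
def qr_mgs(A):
    """Thin QR of an m x n matrix (m >= n).
    Returns Q (m x n, orthonormal columns) and R (n x n, upper triangular)."""
    m, n = len(A), len(A[0])
    cols = []                                  # orthonormal columns of Q
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = [A[i][j] for i in range(m)]
        for i, q in enumerate(cols):
            R[i][j] = sum(qr * vr for qr, vr in zip(q, v))
            v = [vr - R[i][j] * qr for qr, vr in zip(q, v)]
        R[j][j] = math.sqrt(sum(vr * vr for vr in v))
        cols.append([vr / R[j][j] for vr in v])
    Q = [[cols[j][i] for j in range(n)] for i in range(m)]
    return Q, R


def mmse_L(H, alpha):
    """Return (L, Q2, R) with L = Q2^T, from the QR of [H^T; sqrt(alpha) I]."""
    K = len(H)
    ext = [[H[j][i] for j in range(K)] for i in range(K)]            # H^T
    ext += [[math.sqrt(alpha) if i == j else 0.0 for j in range(K)]
            for i in range(K)]                                     # sqrt(alpha) I
    Q, R = qr_mgs(ext)
    Q2 = Q[K:]                                                     # lower K x K block
    L = [[Q2[j][i] for j in range(K)] for i in range(K)]           # Q2^T
    return L, Q2, R


H = [[1.0, 0.4], [0.3, 0.8]]
alpha = 0.5
L, Q2, R = mmse_L(H, alpha)
# R @ (Q2 / sqrt(alpha)) should be the identity, i.e. (9) holds.
prod = [[sum(R[i][k] * Q2[k][j] for k in range(2)) / math.sqrt(alpha)
         for j in range(2)] for i in range(2)]
print(prod)
```

The identity follows from the lower block of the factorization, $\sqrt{\alpha}\,\mathbf{I} = \mathbf{Q}_2\mathbf{R}$, so no explicit matrix inversion is needed.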

In this paper, $t_k$ is selected from the symmetric integer set

$$\mathcal{A} = \{-a, -a+1, \ldots, a-1, a\}, \qquad (10)$$

where $a$ is a positive integer chosen to achieve a tradeoff between the performance and the complexity of the vector-perturbation algorithm: as $a$ increases, the bit error rate (BER) performance improves but the complexity also increases, and vice versa. Hereafter, $T = 2a + 1$ denotes the number of elements in $\mathcal{A}$. The value of $a$ is selected through extensive simulation.



III. REVIEW OF THE CONVENTIONAL VECTOR-PERTURBATION TECHNIQUES

A. Sphere Encoder 

The idea of the SE is to limit the search to the vectors $\mathbf{t}$ that reside in a hypersphere with a predefined radius. Therefore,

$$\left\|\mathbf{L}(\mathbf{s}+\tau\mathbf{t})\right\|^{2} \le d^{2}, \qquad (11)$$

where $d$ is the predefined search radius. The search radius is updated whenever a vector $\mathbf{t}$ with a smaller accumulative metric is found.

The main drawbacks of the SE are that its complexity is random and can be prohibitive in the worst case, and that its tree-search phase is sequential, which hinders an efficient hardware implementation.
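The depth-first search with radius updates described above can be sketched in Python as follows. This is an illustrative sketch, not the implementation of [6]: it visits the levels in the order of the rows of the lower-triangular $\mathbf{L}$, prunes any partial path whose accumulative metric already exceeds the current squared radius, and shrinks the radius at every improved leaf. The visit counter exposes the data-dependent complexity; all names and the toy matrix are hypothetical.

```python
import itertools


# Sketch only; function names, L, s and tau below are hypothetical.
def metric(L, s, tau, t):
    """Accumulative metric ||L (s + tau t)||^2 for a full vector t."""
    x = [si + tau * ti for si, ti in zip(s, t)]
    return sum(sum(L[k][j] * x[j] for j in range(k + 1)) ** 2 for k in range(len(s)))


def sphere_encoder(L, s, tau, A, d2=float("inf")):
    """Depth-first sphere search over t in A^K; returns (t, metric, visited)."""
    K = len(s)
    best = {"t": None, "d2": d2}
    visited = 0

    def search(k, t, x, partial):
        nonlocal visited
        if k == K:                       # leaf inside the sphere: shrink the radius
            best["t"], best["d2"] = t, partial
            return
        past = sum(L[k][j] * x[j] for j in range(k))
        for c in A:
            visited += 1                 # one metric computation per visited node
            xk = s[k] + tau * c
            m = partial + (past + L[k][k] * xk) ** 2
            if m < best["d2"]:           # node lies inside the current sphere
                search(k + 1, t + [c], x + [xk], m)

    search(0, [], [], 0.0)
    return best["t"], best["d2"], visited


L = [[1.0, 0.0, 0.0], [0.9, 0.5, 0.0], [-0.4, 0.7, 0.6]]
s, tau, A = [1.0, -1.0, 1.0], 4.0, range(-2, 3)
t_se, d2_se, visited = sphere_encoder(L, s, tau, A)
brute = min(itertools.product(A, repeat=3), key=lambda t: metric(L, s, tau, t))
print(t_se, list(brute), visited)
```

The number of visited nodes depends on $\mathbf{L}$ and $\mathbf{s}$, and is bounded only by the worst case $\sum_{i=1}^{K} T^{i}$.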

B. QR-decomposition with M-algorithm Encoder (QRDM-E)

In the QRDM-E, the best $M$ branches, i.e., those with the smallest accumulative metrics, are retained at each encoding level. The value of $M$ sets the tradeoff between performance and computational complexity; to achieve a fair comparison with the FSE, $M$ is set to $T$. Therefore, at the first tree-search stage, the best $M$ branches are retained for level 2. At level 2, the retained branches are expanded to all possible combinations of $(s_2 + \tau t_2)$. The resulting $MT = M^{2}$ branches are sorted according to their accumulative metrics, calculated via (8), and only the $M$ branches with the smallest accumulative metrics are retained for level 3. This strategy is repeated up to the last encoding level, where the perturbed vector $\tilde{\mathbf{s}}$ with the smallest accumulative metric is precoded and transmitted.

The QRDM-E algorithm has a fixed complexity that is independent of the noise variance and of the channel conditioning. Nonetheless, its complexity grows rapidly with $K$ and $T$.
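A breadth-first sketch of the M-algorithm search is given below in Python. It is an illustration under stated assumptions (hypothetical names, a toy $\mathbf{L}$, and the per-node metric of (8) computed row by row): every survivor is expanded to all $T$ children, the $M$ best are kept, and visited nodes are counted. With $M = T$ the count equals $T + (K-1)T^{2}$, the QRDM-E figure used in Section V.

```python
# Sketch only; function name, L and parameters below are hypothetical.
def qrdm_encoder(L, s, tau, A, M):
    """M-algorithm tree search; returns (t, metric, visited)."""
    K = len(s)
    survivors = [(0.0, [], [])]          # (accumulative metric, t, s + tau t)
    visited = 0
    for k in range(K):
        children = []
        for acc, t, x in survivors:
            past = sum(L[k][j] * x[j] for j in range(k))
            for c in A:                  # expand to every element of A
                visited += 1
                xk = s[k] + tau * c
                children.append((acc + (past + L[k][k] * xk) ** 2, t + [c], x + [xk]))
        children.sort(key=lambda b: b[0])
        survivors = children[:M]         # keep the M best branches
    acc, t, _ = survivors[0]
    return t, acc, visited


L = [[1.0, 0.0, 0.0], [0.9, 0.5, 0.0], [-0.4, 0.7, 0.6]]
A = range(-1, 2)                         # a = 1, so T = 3
t, acc, visited = qrdm_encoder(L, [1.0, -1.0, 1.0], 4.0, A, M=3)
print(t, round(acc, 3), visited)         # visited = 3 + 2 * 9 = 21
```

The global sort over all $MT$ children at every level is the step that resists pipelining in hardware.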

IV. PROPOSED FIXED-COMPLEXITY SPHERE ENCODER

The FSE is motivated by the following goals:

• Increasing the encoding throughput: The encoding throughput of the QRDM-E is fixed, which is favorable for communication systems. Nevertheless, this throughput is low because of the limited possibilities for an efficient hardware implementation of the QRDM-E. This is mainly due to its search strategy, in which a large number of metrics is compared at each encoding level. The proposed FSE has a high encoding throughput owing to its parallel tree-search phase.

• Decreasing the complexity: The complexity of the QRDM-E increases significantly for high $K$ and $T$, and the SE has a high worst-case computational complexity.

To overcome these drawbacks of the conventional encoding schemes, the FSE is proposed in this paper.

The tree-search phase of the proposed FSE algorithm consists of the following two steps:

• Full expansion: At the first $p$ tree-search levels, the retained branches are expanded to all possible nodes, and all the resulting branches are retained for the next level. The choice of $p$ is addressed in Section VII.

• Single expansion: Only a single expansion is performed from each node retained at the preceding encoding level, by following the decision-feedback equalization (DFE) path. At the last search level, the metrics of the obtained perturbed vectors $\tilde{\mathbf{s}}_1, \tilde{\mathbf{s}}_2, \ldots, \tilde{\mathbf{s}}_{T^{p}}$ are compared, and the vector with the smallest metric is precoded and transmitted.

Fig. 2. Ratio between the complexities of FSE and QRDM-E for several K and T values.

Fig. 1 shows an example of the proposed FSE algorithm for $p = 1$, $K = 3$, and $t_k \in \{-2, -1, 0, 1, 2\}$. At the first search level, the root node is expanded to the five possible values of $(s_1 + \tau t_1)$, one for each element of the set. The metrics of the resulting branches are then calculated, and all the branches are retained for the next level. At levels 2 to $K$, only a single expansion is performed from each retained node, by following the DFE path. At the last search level, the vector $\tilde{\mathbf{s}}$ with the lowest accumulative metric, indicated by the thick line in Fig. 1, is precoded and transmitted.
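The two FSE steps can be sketched in Python as follows. This is an illustrative sketch, not a hardware description: the per-node metric follows (8), full expansion is applied at the first $p$ levels, and afterwards each branch follows its own DFE path by keeping the child with the smallest absolute branch value. Because the branches evolve independently after level $p$, the loop over branches is what a pipelined implementation would parallelize. Function names and the toy $\mathbf{L}$ are hypothetical; the example uses the setting of Fig. 1 ($p = 1$, $K = 3$, $T = 5$).

```python
import itertools


# Sketch only; function names, L, s and tau below are hypothetical.
def fse_encoder(L, s, tau, A, p=1):
    """Fixed-complexity sphere encoder sketch; returns (t, metric, visited)."""
    K = len(s)
    branches = [(0.0, [], [])]           # (accumulative metric, t, s + tau t)
    visited = 0
    for k in range(K):
        grown = []
        for acc, t, x in branches:       # independent branches: parallelizable
            past = sum(L[k][j] * x[j] for j in range(k))
            cands = [(c, past + L[k][k] * (s[k] + tau * c)) for c in A]
            if k < p:                    # full expansion: keep all T children
                chosen = cands
            else:                        # single expansion along the DFE path
                chosen = [min(cands, key=lambda cv: abs(cv[1]))]
            for c, v in chosen:
                visited += 1             # count only the retained (visited) nodes
                grown.append((acc + v * v, t + [c], x + [s[k] + tau * c]))
        branches = grown
    acc, t, _ = min(branches, key=lambda b: b[0])
    return t, acc, visited


def metric(L, s, tau, t):
    """Accumulative metric ||L (s + tau t)||^2."""
    x = [si + tau * ti for si, ti in zip(s, t)]
    return sum(sum(L[k][j] * x[j] for j in range(k + 1)) ** 2 for k in range(len(s)))


# Setting of Fig. 1: p = 1, K = 3, t_k in {-2, ..., 2} (T = 5).
L = [[1.0, 0.0, 0.0], [0.9, 0.5, 0.0], [-0.4, 0.7, 0.6]]
s, tau, A = [1.0, -1.0, 1.0], 4.0, list(range(-2, 3))
t1, acc1, visited1 = fse_encoder(L, s, tau, A, p=1)
opt = min(metric(L, s, tau, t) for t in itertools.product(A, repeat=3))
print(t1, visited1, acc1 >= opt - 1e-12)   # visited1 = K * T = 15
```

Setting $p = K$ turns the sketch into exhaustive enumeration, while small $p$ keeps the node count fixed regardless of the channel.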



V. ANALYSIS OF THE COMPUTATIONAL COMPLEXITY

In this section, we compute the computational complexity of the vector-perturbation techniques in terms of the number of visited nodes, i.e., the number of metric computations.
The worst-case complexity of the SE is given by:

$$C_{\mathrm{SE}} = \sum_{i=1}^{K} T^{i} = \frac{T^{K+1} - T}{T - 1}. \qquad (12)$$



For high $K$, the worst-case complexity of the SE becomes prohibitively high.

To obtain the perturbation vector, the QRDM-E algorithm performs

$$C_{\mathrm{QRDM\text{-}E}} = T + (K-1)T^{2} \qquad (13)$$

metric computations. On the other hand, the proposed FSE (with $p = 1$) only requires

$$C_{\mathrm{FSE}} = KT \qquad (14)$$

metric computations. This indicates that the proposed algorithm is lightweight and suitable for mobile communication systems, which are power and latency limited.

Fig. 2 shows the ratio $\rho = C_{\mathrm{FSE}}/C_{\mathrm{QRDM\text{-}E}}$ versus the real space dimension ($K = 2N_t$). In a 4 x 4 system, i.e., $K = 8$, with $T = 7$, the FSE requires only 16% of the computational complexity of the QRDM-E.
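The counts in (12)-(14) are easy to tabulate. The Python sketch below evaluates them and reproduces the 16% figure for $K = 8$, $T = 7$. For $p > 1$ it uses the count $\sum_{i=1}^{p} T^{i} + (K-p)T^{p}$, which reduces to (14) for $p = 1$; this generalization is an assumption made here, although it agrees with the FSE-p2 entries of Table 1 in Section VII. Function names are hypothetical.

```python
# Sketch of the visited-node counts; the p > 1 case of c_fse is an assumption.
def c_se_worst(K, T):
    """Worst-case SE visited nodes, (12): sum_{i=1}^{K} T^i."""
    return (T ** (K + 1) - T) // (T - 1)


def c_qrdm(K, T):
    """QRDM-E visited nodes with M = T, (13)."""
    return T + (K - 1) * T ** 2


def c_fse(K, T, p=1):
    """FSE visited nodes: full expansion for p levels, then one child per branch.
    For p = 1 this equals K * T, i.e. (14)."""
    return sum(T ** i for i in range(1, p + 1)) + (K - p) * T ** p


ratio = c_fse(8, 7) / c_qrdm(8, 7)
print(f"{ratio:.2f}")                                  # 0.16
print([c_qrdm(8, 9), c_fse(8, 9), c_fse(8, 3, p=2)])   # [576, 72, 66]
```

The FSE count grows linearly in $K$ for fixed $p$, whereas the QRDM-E count grows with $T^{2}$ per level.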

In [11], it has been shown that, at the same clock frequency of 100 MHz, in a 4 x 4 system with 16-QAM, the achieved throughputs of the FSE and the QRDM-E are 400 Mbps and 53.3 Mbps, respectively.¹ This indicates that the encoding throughput of the FSE is 7.5 times that of the conventional QRDM-E.



VI. PROPOSED COMPLEXITY REDUCTION TECHNIQUES FOR FSE

To reduce the computational complexity of the FSE algorithm, we propose to use pre-computations to obtain the elements of $\mathbf{L}(\mathbf{s}+\tau\mathbf{t})$, which are then accessed via their indices. These elements are stored in the matrix $\mathbf{A} \in \mathbb{R}^{U \times DT}$, where $U = \sum_{i=1}^{K} i$ is the number of nonzero entries of $\mathbf{L}$ and $D$ is the size of the real constellation set. Because these computations are performed only when the channel matrix is updated, the numbers of multiplications and additions required per transmission are given by

$$C^{\mathrm{mul}}_{\mathrm{pre}} = \frac{DTK(K+1) + 2T - 2}{2N_f} \qquad (15)$$

and

$$C^{\mathrm{add}}_{\mathrm{pre}} = \frac{(T-1)D}{N_f}, \qquad (16)$$

where $C^{\mathrm{mul}}_{\mathrm{pre}}$ and $C^{\mathrm{add}}_{\mathrm{pre}}$ are the required real multiplications and additions, respectively, for the pre-computation stage, and $N_f$ is the number of transmissions using the same channel state information (CSI).

¹ In fact, these results are given for hardware implementations of the FSD and QRD-M detection algorithms. Since the tree-search phase is similar for encoding and detection, we consider these values appropriate for the purpose of comparison.

Fig. 3. BER performance of the vector-perturbation techniques for K = 8 and several values of T, using QPSK modulation, at an SNR of 20 dB.

At the tree-search phase, instead of comparing the squared norms of the branch metrics, as in [8], we propose to compare the absolute values of the branch metrics, hereafter referred to as the absolute metrics. This is done by comparing the metrics obtained in (8) before the squaring operation. The branch with the smallest absolute metric is selected, and only its accumulative metric is computed, as in (8). As a consequence, the number of multiplications required at each node is reduced from $T$ to one. In the sequel, this technique is referred to as the comparison-before-squaring strategy. Hence, the multiplications and additions required in the tree-search phase of the proposed FSE algorithm ($p = 1$) are as follows:



$$C^{\mathrm{mul}}_{\mathrm{tree}} = KT \qquad (17)$$

and

$$C^{\mathrm{add}}_{\mathrm{tree}} = \frac{1}{2}TK(K-1) + T - 1. \qquad (18)$$
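The saving rests on the fact that squaring is monotonic in the absolute value, so replacing squares by magnitudes does not change which branch is selected. The Python sketch below (hypothetical helper names, toy branch values) contrasts the two selection rules for one node with $T$ candidate branch values and reports the multiplications each one spends.

```python
# Sketch only: compares two selection rules at a single node; names are hypothetical.
def select_square_all(partial, values):
    """Baseline: square every candidate branch value (T multiplications)."""
    squares = [v * v for v in values]
    k = min(range(len(values)), key=squares.__getitem__)
    return k, partial + squares[k], len(values)


def select_compare_first(partial, values):
    """Comparison-before-squaring: compare |v|, square only the winner."""
    k = min(range(len(values)), key=lambda i: abs(values[i]))
    return k, partial + values[k] * values[k], 1


values = [2.5, -0.5, 1.5, -3.0]           # branch values for T = 4 candidates
print(select_square_all(1.0, values))     # (1, 1.25, 4)
print(select_compare_first(1.0, values))  # (1, 1.25, 1)
```

Both rules return the same branch and accumulative metric; only the multiplication count differs.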



Then, the total number of multiplications required by the proposed FSE algorithm is given by

$$C^{\mathrm{mul}} = \frac{DTK(K+1) + 2T - 2}{2N_f} + KT, \qquad (19)$$

and the total number of additions is given by

$$C^{\mathrm{add}} = \frac{D(T-1)}{N_f} + \frac{1}{2}TK(K-1) + T - 1. \qquad (20)$$
For large $N_f$, $C^{\mathrm{mul}} \approx C^{\mathrm{mul}}_{\mathrm{tree}}$ and $C^{\mathrm{add}} \approx C^{\mathrm{add}}_{\mathrm{tree}}$.
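The amortization over $N_f$ can be checked numerically. The Python sketch below evaluates (19) and (20), with (15)-(18) as the pre-computation and tree-search terms; the parameter values ($K = 8$, $T = 9$, $D = 2$ for the real QPSK alphabet) are illustrative choices, and the function names are hypothetical.

```python
# Sketch of (19) and (20); parameter values in the loop are illustrative only.
def fse_mults(K, T, D, Nf):
    """Total multiplications per transmission, (19) = (15) + (17)."""
    pre = (D * T * K * (K + 1) + 2 * T - 2) / (2 * Nf)   # (15), amortized over Nf
    tree = K * T                                         # (17)
    return pre + tree


def fse_adds(K, T, D, Nf):
    """Total additions per transmission, (20) = (16) + (18)."""
    pre = (T - 1) * D / Nf                               # (16), amortized over Nf
    tree = T * K * (K - 1) / 2 + T - 1                   # (18)
    return pre + tree


for Nf in (1, 10, 100, 1000):
    print(Nf, fse_mults(8, 9, 2, Nf), fse_adds(8, 9, 2, Nf))
# As Nf grows, the totals approach the tree-search terms K*T = 72 and 260.
```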
VII. SIMULATION RESULTS AND DISCUSSION

A. SIMULATIONS WITH PERFECT CSI AT THE TRANSMITTER

In this section, we investigate the bit error rate (BER) performance of the conventional THP scheme, the QRDM-E, and the proposed FSE in 4 x 4 and 8 x 8 MU-MIMO systems, i.e., for $K = 8$ and $K = 16$, respectively, using QPSK modulation. The conventional THP algorithm is considered as the special case of the proposed FSE algorithm in which only the branch with the smallest accumulative metric is retained at each precoding level, i.e., only the decision-feedback equalization path is followed [12]. In the sequel, the MMSE precoding criterion is used owing to its superior performance compared with the ZF criterion.

Fig. 4. BER performance of the proposed fixed-complexity sphere encoder for T = 3, p = 1 and p = 2, in 4 x 4 and 8 x 8 systems, using QPSK modulation.

Table 1. Computational complexity of the vector-perturbation schemes in terms of the number of visited nodes.

| System | QRDM-E (T = 9) | FSE-p1 (T = 9) | FSE-p2 (T = 3) |
|---|---|---|---|
| 4 x 4 | 576 | 72 | 66 |
| 8 x 8 | 1224 | 144 | 138 |

Fig. 3 shows the BER of the FSE and QRDM-E schemes for several sizes of the set $\mathcal{A}$ at SNR = 20 dB. The largest performance improvement occurs when moving from $T = 3$ to $T = 5$. For $T > 9$, no further improvement is observed for the FSE, while a small additional improvement is observed for the QRDM-E. Therefore, as a tradeoff between performance and complexity, we set $T$ to 9 in the sequel.

Fig. 4 depicts the BER performance of the proposed FSE for $p = 1$ and $p = 2$, referred to as FSE-p1 and FSE-p2 in the sequel, in 4 x 4 and 8 x 8 systems. In the 4 x 4 system and for $T = 3$, FSE-p1 and FSE-p2 require 24 and 66 metric computations, respectively. At a target BER of $10^{-4}$, FSE-p2 outperforms FSE-p1 by 2 dB and 0.9 dB in the 4 x 4 and 8 x 8 systems, respectively. Note that the FSD [8] does not gain in performance when its complexity is increased in a 4 x 4 system, whereas, as mentioned above, the FSE performs better when moving from FSE-p1 to FSE-p2.

Figs. 5 and 6 depict the BER performance of the proposed FSE compared with that of the conventional algorithms in 4 x 4 and 8 x 8 systems, respectively. Table 1 gives the computational complexities of the vector-perturbation algorithms. These results show the low complexity of the proposed FSE compared with the conventional QRDM-E. For instance, in the 4 x 4 system, FSE-p1 and FSE-p2 require only 12.5% and 11.46%, respectively, of the number of metric computations performed by the conventional QRDM-E.

Fig. 5. BER performance of the proposed fixed-complexity sphere encoder in a 4 x 4 system, using QPSK modulation.

Fig. 6. BER performance of the proposed fixed-complexity sphere encoder in an 8 x 8 system, using QPSK modulation.

Table 2. Mean and standard deviation of the metrics corresponding to the retained candidates at the last encoding level, averaged over 100,000 independent channel realizations.

| System | FSE-p1 (T = 9) mean | FSE-p1 (T = 9) std | FSE-p2 (T = 3) mean | FSE-p2 (T = 3) std |
|---|---|---|---|---|
| 4 x 4 | 40.7 | 40.4 | 16.7 | 13.8 |
| 8 x 8 | 22.2 | 15.2 | 10.7 | 3.9 |
Fig. 7. Effect of the imperfect CSI on the BER performance of the QRDM-E, THP, and the proposed FSE-p2 algorithms in a 4 x 4 system, using QPSK modulation.



In terms of BER performance, in the 4 x 4 system, FSE-p2 outperforms the THP technique by 7.4 dB while lagging behind the QRDM-E by 1.3 dB at the considered target BER. In the 8 x 8 system, FSE-p2 outperforms THP by 2.7 dB while lagging behind the QRDM-E by 1.2 dB at the same target BER.

From Figs. 5 and 6, it is clear that FSE-p2 outperforms FSE-p1 even though it has a lower computational complexity. This is because the candidates retained by FSE-p2 are much closer to each other in the Euclidean space. As a consequence, convergence to the optimum candidate is more probable than with FSE-p1, which leads to more distant candidates. Table 2 lists the mean and standard deviation of the accumulative metrics corresponding to the candidate vectors retained at the last encoding level. FSE-p2 yields both a lower mean and a lower standard deviation of these metrics than FSE-p1. Hence, we recommend using FSE-p2 instead of FSE-p1, even in low-dimensional MU-MIMO systems.

B. SIMULATIONS WITH IMPERFECT CSI AT THE TRANSMITTER

In this section, we consider that the channel state information is not perfectly fed back to the transmitter, owing to quantization error, practical channel estimation, feedback delay, etc. Therefore, the channel matrix at the transmitter is defined as

$$\hat{\mathbf{H}} = \mathbf{H} + \mathbf{B}, \qquad (21)$$

where $\mathbf{H}$ is the true (perfectly estimated) channel and $\mathbf{B}$ is the error matrix, whose elements follow zero-mean complex normal distributions. Herein, we define $\zeta = 10\log_{10}\left(\|\mathbf{H}\|_F^{2}/\|\mathbf{B}\|_F^{2}\right)$, which is a measure of the amount of CSI error.

For simplicity of the derivation, we consider that $\alpha \to 0$ and $N_t = N_u$; then the power of the error in the precoding matrix can be upper-bounded by

where $\sigma_i(\mathbf{H})$ is the $i$-th singular value of $\mathbf{H}$. A detailed derivation of (22) is given in the Appendix. It is clear from (22) that a small error in the CSI may lead to a large error in the precoding matrix, particularly when the channel matrix is ill-conditioned, in which case the first summation in (22) becomes large.

Fig. 7 shows the BER performance of the precoding algorithms for $\zeta = 25$ dB in a 4 x 4 system. At the target BER, degradations of 1 dB and 1.5 dB are observed in the BER performance of the QRDM-E and the proposed FSE-p2 vector-perturbation techniques, respectively. This degradation is tolerable in practical systems compared with the remarkable reduction in transmit power achieved by the proposed FSE. On the other hand, an error floor appears in the performance of the THP algorithm when $\zeta = 25$ dB. This is because the number of candidates retained at each encoding level is insufficient; it also indicates that the error in the precoding matrix becomes dominant.

In [?] and [13], it has been shown that the QRDM-E achieves the optimum diversity order, i.e., the BER curve of the QRDM-E is parallel to that of the optimum encoder. In this paper, we consider the QRDM-E as a reference: the proposed algorithm is shown to achieve the same diversity order as the QRDM-E, i.e., their BER curves are parallel, and therefore the proposed algorithm achieves the optimum diversity order.

VIII. CONCLUSIONS 

In this paper, we proposed a fixed-complexity sphere encoder (FSE) for MU-MIMO systems. Unlike the conventional SE scheme, which has a random complexity and a sequential structure, the proposed FSE has a fixed complexity and a parallel tree-search structure, which makes it more efficient for hardware implementation. Moreover, the complexity of the FSE was analyzed and reduced by two proposed techniques, namely, pre-computing the frequently used values before the tree-search stage, and the comparison-before-squaring strategy, which reduces the number of computations per node. Simulation and analytical results show that the proposed FSE requires a small fraction of the computational complexity and processing time of the conventional QRDM-E. This is achieved with a tolerable degradation in BER while achieving the optimum diversity order.

REFERENCES 

[1] H. Weingarten, Y. Steinberg, and S. Shamai, "The capacity region of the Gaussian multiple-input multiple-output broadcast channel," IEEE Transactions on Information Theory, vol. 52, no. 9, pp. 3936-3964, Sep. 2006.

[2] M. Costa, "Writing on dirty paper," IEEE Transactions on Information Theory, vol. IT-29, pp. 439-441, May 1983.

[3] C. Peel, B. Hochwald, and L. Swindlehurst, "A vector-perturbation technique for near-capacity multiantenna multiuser communication - Part I: Channel inversion and regularization," IEEE Transactions on Communications, vol. 53, no. 1, pp. 195-202, Jan. 2005.

[4] M. Tomlinson, "New automatic equalizer employing modulo arithmetic," Electronics Letters, vol. 7, pp. 138-139, Mar. 1971.

[5] H. Harashima and H. Miyakawa, "Matched-transmission technique for channels with intersymbol interference," IEEE Transactions on Communications, vol. 20, no. 4, pp. 774-780, Aug. 1972.

[6] B. Hochwald, C. Peel, and L. Swindlehurst, "A vector-perturbation technique for near-capacity multiantenna multiuser communication - Part II: Perturbation," IEEE Transactions on Communications, vol. 53, no. 3, pp. 537-544, Mar. 2005.

[7] J. Z. Zhang and K. J. Kim, "Near-capacity MIMO multiuser precoding with QRD-M algorithm," in Proc. IEEE ACSSC, Nov. 2005, pp. 1498-1502.

[8] L. Barbero and J. Thompson, "Performance analysis of a fixed-complexity sphere decoder in high-dimensional MIMO systems," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2006, pp. 557-560.

[9] R. Habendorf and G. Fettweis, "On ordering optimization for MIMO systems with decentralized receivers," in Proc. IEEE Vehicular Technology Conference, May 2006, pp. 1844-1848.

[10] D. Wübben, R. Böhnke, V. Kühn, and K.-D. Kammeyer, "MMSE extension of V-BLAST based on sorted QR decomposition," in Proc. IEEE Vehicular Technology Conference, Oct. 2003, pp. 508-512.

[11] L. Barbero and J. Thompson, "Rapid prototyping of the sphere decoder for MIMO systems," in Proc. 2nd IEE/EURASIP Conference on DSP enabled Radio, 2005.

[12] J. Liu and W. Krzymien, "Improved Tomlinson-Harashima precoding for the downlink of multi-user MIMO systems," Canadian Journal of Electrical and Computer Engineering, vol. 32, no. 3, pp. 133-144, Summer 2007.

[13] C.-B. Chae, S. Shim, and R. Heath, Jr., "Block diagonalized vector perturbation for multiuser MIMO systems," IEEE Transactions on Wireless Communications, vol. 7, no. 11, pp. 4051-4057, Nov. 2008.

Appendix 

Let the imperfect channel matrix be given by: 



$$\hat{\mathbf{H}} = \mathbf{H} + \mathbf{B}, \qquad (23)$$



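where $\mathbf{B}$ is the error matrix. Assume that the elements of $\mathbf{B}$ are small enough that $\lim_{n\to\infty}(\mathbf{H}^{-1}\mathbf{B})^{n} = \mathbf{0}$. Then, based on the Neumann series, we have

$$(\mathbf{H}+\mathbf{B})^{-1} = \sum_{n=0}^{\infty}\left(-\mathbf{H}^{-1}\mathbf{B}\right)^{n}\mathbf{H}^{-1}. \qquad (24)$$

The first-order approximation of (24) is given by

$$(\mathbf{H}+\mathbf{B})^{-1} \approx \mathbf{H}^{-1} - \mathbf{H}^{-1}\mathbf{B}\mathbf{H}^{-1}.$$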
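To illustrate how accurate the first-order term is, the following Python sketch (2 x 2 matrices and helper names chosen for illustration only) compares the exact change of the inverse with $-\mathbf{H}^{-1}\mathbf{B}\mathbf{H}^{-1}$. The residual is small relative to the approximation and shrinks by roughly a factor of four when $\mathbf{B}$ is halved, as expected from the second-order remainder of (24).

```python
import math


# Sketch only; H, B and helper names are hypothetical illustration values.
def inv2(M):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(A, B):
    """Plain matrix product for lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def fro(M):
    """Frobenius norm."""
    return math.sqrt(sum(x * x for row in M for x in row))


def first_order_residual(H, B):
    """Return (||exact - approx||_F, ||approx||_F) for the change of the inverse."""
    Hinv = inv2(H)
    Hhat_inv = inv2([[h + e for h, e in zip(rh, rb)] for rh, rb in zip(H, B)])
    exact = [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(Hhat_inv, Hinv)]
    approx = [[-v for v in row] for row in matmul(matmul(Hinv, B), Hinv)]
    resid = [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(exact, approx)]
    return fro(resid), fro(approx)


H = [[2.0, 0.5], [0.3, 1.5]]
B = [[0.01, -0.02], [0.015, 0.005]]
r1, a1 = first_order_residual(H, B)
r2, _ = first_order_residual(H, [[x / 2 for x in row] for row in B])
print(r1 / a1, r2 / r1)   # small relative residual; ratio close to 0.25
```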
Therefore, the approximated error in the precoding matrix becomes $\hat{\mathbf{H}}^{-1} - \mathbf{H}^{-1} \approx -\mathbf{H}^{-1}\mathbf{B}\mathbf{H}^{-1}$, and the Frobenius norm of this approximated error matrix is upper-bounded as follows: